STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Neural Information Processing Systems

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WNs' and the server's directions are chosen based on a certain stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this is the first FL algorithm to achieve such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained.
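The momentum estimator described above can be illustrated with a minimal single-worker sketch. This is not the paper's algorithm, only a STORM-style two-point momentum direction on a toy quadratic loss; the step size `eta`, momentum parameter `a`, and noise model are illustrative assumptions.

```python
import numpy as np

# Toy loss f(x) = 0.5 * ||x||^2, whose stochastic gradient is x + noise.
rng = np.random.default_rng(0)

x = np.ones(5)
d = x + 0.01 * rng.standard_normal(x.shape)  # initialize direction with a noisy gradient
eta, a = 0.1, 0.5                            # step size and momentum parameter (assumed)

for _ in range(200):
    x_prev, x = x, x - eta * d
    # Two-point momentum estimator: correct the previous direction with a
    # gradient difference evaluated on the SAME fresh sample (same noise).
    noise = 0.01 * rng.standard_normal(x.shape)
    g_new = x + noise       # stochastic gradient at the new iterate
    g_old = x_prev + noise  # stochastic gradient at the old iterate, same sample
    d = g_new + (1 - a) * (d - g_old)
```

Because both gradients in the correction term share the same sample, the noise largely cancels, which is the variance-reduction effect the estimator relies on; after 200 steps the iterate is close to the minimizer at the origin.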


A Supplementary Material

Amelia Jimenez Sanchez

Neural Information Processing Systems

On HuggingFace, we find information about the annotation creators (e.g., crowdsource, experts, ml-generated) or specific task categories (e.g., image-classification, image-to-text, text-to-image). Kaggle automatically computes a usability score, which is associated with the tag "well-documented". Kaggle's usability score is based on: Completeness: subtitle, tag, description, cover image. Credibility: provenance, public notebook, update frequency. Compatibility: license, file format, file description, column description. The usability score is based on only 4 out of 7 aspects from Datasheets [40].
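A checklist score like the one described can be sketched as follows. The grouping mirrors the three aspects listed above, but the plain unweighted average and the field names are assumptions for illustration, not Kaggle's actual formula.

```python
# Hypothetical checklist groups, mirroring the Completeness / Credibility /
# Compatibility aspects described in the text.
CHECKLIST = {
    "completeness": ["subtitle", "tag", "description", "cover_image"],
    "credibility": ["provenance", "public_notebook", "update_frequency"],
    "compatibility": ["license", "file_format", "file_description",
                      "column_description"],
}

def usability_score(dataset_meta):
    """Fraction of checklist items present in the dataset's metadata."""
    items = [field for group in CHECKLIST.values() for field in group]
    satisfied = sum(bool(dataset_meta.get(field)) for field in items)
    return satisfied / len(items)

# A dataset documenting only 3 of the 11 items gets a low score.
score = usability_score({"subtitle": "x", "tag": "cv", "license": "CC0"})
```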


LLM-based Multi-Agent System for Simulating Strategic and Goal-Oriented Data Marketplaces

Sashihara, Jun, Fujita, Yukihisa, Nakamura, Kota, Kuwahara, Masahiro, Hayashi, Teruaki

arXiv.org Artificial Intelligence

Abstract: Data marketplaces, which mediate the purchase and exchange of data from third parties, have attracted growing attention for reducing the cost and effort of data collection while enabling the trading of diverse datasets. However, a systematic understanding of the interactions between market participants, data, and regulations remains limited. To address this gap, we propose a Large Language Model-based Multi-Agent System (LLM-MAS) for data marketplaces. In our framework, buyer and seller agents powered by LLMs operate with explicit objectives and autonomously perform strategic actions such as planning, searching, purchasing, pricing, and updating data. These agents can reason about market dynamics, forecast future demand, and adapt their strategies accordingly. Unlike conventional model-based simulations, which are typically constrained to predefined rules, LLM-MAS supports broader and more adaptive behavior selection through natural language reasoning. We evaluated the framework via simulation experiments using three distribution-based metrics: (1) the number of purchases per dataset, (2) the number of purchases per buyer, and (3) the number of repeated purchases of the same dataset. The results demonstrate that LLM-MAS reproduces trading patterns observed in real data marketplaces more faithfully than traditional approaches, and further captures the emergence and evolution of market trends. Data have emerged as a tradable economic resource, and data marketplaces that mediate the purchase and exchange of datasets from third parties have rapidly expanded [1]. These marketplaces streamline data collection that previously required substantial cost and effort, while also providing organizations and researchers with access to diverse, high-quality datasets. As a result, they are increasingly recognized as critical infrastructures that accelerate innovation based on data that were previously closed within individual organizations [2].
Despite this progress, our understanding of how interactions among market participants, data, and regulations shape market dynamics remains limited. Smooth and efficient data transactions require well-designed and robust data marketplaces [3].
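The three distribution-based metrics used in the evaluation can be computed directly from a purchase log. The toy log of (buyer, dataset) pairs below is illustrative, not data from the paper.

```python
from collections import Counter

# Toy purchase log: each entry records (buyer_id, dataset_id).
log = [("b1", "d1"), ("b1", "d1"), ("b2", "d1"), ("b2", "d2"), ("b3", "d2")]

# (1) Number of purchases per dataset.
purchases_per_dataset = Counter(d for _, d in log)
# (2) Number of purchases per buyer.
purchases_per_buyer = Counter(b for b, _ in log)
# (3) Repeated purchases of the same dataset: purchases beyond a
#     buyer's first purchase of each dataset.
pair_counts = Counter(log)
repeat_purchases = sum(c - 1 for c in pair_counts.values())
```

Each metric is a count distribution, so comparing a simulated marketplace against a real one reduces to comparing these histograms.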


812214fb8e7066bfa6e32c626c2c688b-Paper.pdf

Neural Information Processing Systems

In this work, we argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions.








Mechanistic Insights into Grokking from the Embedding Layer

AlquBoj, H. V., AlQuabeh, Hilal, Bojkovic, Velibor, Nwadike, Munachiso, Inui, Kentaro

arXiv.org Artificial Intelligence

Grokking, a delayed generalization in neural networks after perfect training performance, has been observed in Transformers and MLPs, but the components driving it remain underexplored. We show that embeddings are central to grokking: introducing them into MLPs induces delayed generalization in modular arithmetic tasks, whereas MLPs without embeddings can generalize immediately. Our analysis identifies two key mechanisms: (1) Embedding update dynamics, where rare tokens stagnate due to sparse gradient updates and weight decay, and (2) Bilinear coupling, where the interaction between embeddings and downstream weights introduces saddle points and increases sensitivity to initialization. To confirm these mechanisms, we investigate frequency-aware sampling, which balances token updates by minimizing gradient variance, and embedding-specific learning rates, derived from the asymmetric curvature of the bilinear loss landscape. We prove that an adaptive learning rate ratio, \(\frac{\eta_E}{\eta_W} \propto \frac{\sigma_{\max}(E)}{\sigma_{\max}(W)} \cdot \frac{f_W}{f_E}\), mitigates bilinear coupling effects, accelerating convergence. Our methods not only improve grokking dynamics but also extend to broader challenges in Transformer optimization, where bilinear interactions hinder efficient training.
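The learning-rate ratio in the abstract can be evaluated numerically. The sketch below assumes random matrices for the embedding matrix E and downstream weights W, and illustrative update frequencies f_E and f_W; it only shows how the ratio scales the embedding learning rate, not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((50, 16))   # embedding matrix (tokens x dim), illustrative
W = rng.standard_normal((16, 16))   # downstream weight matrix, illustrative
f_E, f_W = 0.2, 1.0                 # relative update frequencies (rare vs. frequent)

# eta_E / eta_W proportional to (sigma_max(E) / sigma_max(W)) * (f_W / f_E),
# where sigma_max is the largest singular value.
sigma_E = np.linalg.svd(E, compute_uv=False)[0]
sigma_W = np.linalg.svd(W, compute_uv=False)[0]
ratio = (sigma_E / sigma_W) * (f_W / f_E)

eta_W = 1e-3
eta_E = ratio * eta_W  # embedding LR scaled up to compensate for sparse updates
```

With rarely updated embeddings (f_E < f_W), the ratio exceeds one, so the embedding layer receives a larger learning rate than the downstream weights.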